eScholarship - University of California

TOLKIN – Tree of Life Knowledge and Information Network: Filling a Gap for Collaborative Research in Biological Systematics

Author: B Ludaescher
Christopher A. Dell
CS Parr
DE Soltis
DR Maddison
DS Carneiro-Torres
E Pennisi
ES Lander
Greg H. Traub
H-J Esser
HD Zhimin W
J Wieczorek
JC Venter
Jin Koh
M Gross
MA O'Leary
MB Jones
Nestor Santiago
Nico Cellinese
PH Pahlevani
RA Vos
Reed S. Beaman
Robert DeSalle
SD Kahn
T Oinn
TJ Vision
WK Michener
Publication venue: Public Library of Science
Publication date: 01/01/2012

The development of biological informatics infrastructure capable of supporting growing data management and analysis environments is an increasing need within the systematic biology community. Although significant progress has been made in recent years on developing new algorithms and tools for analyzing and visualizing large phylogenetic data and trees, implementation of these resources is often carried out by bioinformatics experts using one-off scripts. Therefore, a gap exists in providing data management support for a large set of non-technical users. The TOLKIN project (Tree of Life Knowledge and Information Network) addresses this need by supporting capabilities to manage, integrate, and provide public access to molecular, morphological, and biocollections data and research outcomes through a collaborative web application. This data management framework allows aggregation and import of sequences together with underlying documentation about their source, including vouchers, tissues, and DNA extractions. It combines features of laboratory information management systems (LIMS) and workflow environments by supporting management at the level of individual observations, sequences, and specimens, as well as assembly and versioning of data sets used in phylogenetic inference. As a web application, the system provides multi-user support that obviates current practices of sharing data sets as files or spreadsheets via email.

CiteSeerX

Public Library of Science (PLOS)

Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map

Author: Alan Christoffels
B Sobrino
D Hernandez
D Jasper G Rees
Daniel J Sargent
DJ Sargent
DR Zerbino
GAL Broggini
H Li
J Butler
Jean-Marc Celton
JJ Doyle
KM Folta
R De La Rosa
R Ming
S DiGuistini
S Huang
The Bovine Genome Sequencing and Analysis Consortium
TJ Vision
Vladimir Shulaev
W Howad
WE MacHardy
X Xu
Xiangming Xu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Determining the position and order of contigs and scaffolds from a genome assembly within an organism's genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole-genome single nucleotide polymorphism sequencing that allows the positioning of sequence contigs onto a linkage map using the bin mapping method. Results: The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using our novel method we were able to anchor 70% and 92% of sequence assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps. Conclusions: We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence and validated our method by mapping simple sequence repeat markers derived from sequence contigs on a full mapping population.
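The bin mapping idea named in this abstract can be illustrated with a minimal sketch. It assumes that each map bin is characterized by the joint genotypes of a small set of selected individuals, and that a SNP scored on the same individuals is placed in whichever bin shares that pattern. The function name, the genotype codes, and the bin labels below are hypothetical and are not taken from the paper.

```python
def assign_bin(marker_genotypes, bin_signatures):
    """Return the names of all bins whose signature matches the marker.

    marker_genotypes: string of genotype codes, one per bin-set individual;
                      '-' marks a missing call (assumed convention).
    bin_signatures:   dict mapping bin name -> signature string of equal length.
    """
    matches = []
    for name, signature in bin_signatures.items():
        if len(signature) != len(marker_genotypes):
            raise ValueError("signature length differs from marker length")
        # A missing call is compatible with any genotype at that position.
        if all(m == "-" or m == s for m, s in zip(marker_genotypes, signature)):
            matches.append(name)
    return matches


# Hypothetical bins defined by four selected individuals.
bins = {"LG1:0-20": "AABB", "LG1:20-45": "ABBB", "LG2:0-30": "BABA"}
placed = assign_bin("ABBB", bins)        # unique match
ambiguous = assign_bin("A-BB", bins)     # missing call leaves two candidates
unplaced = assign_bin("BBBB", bins)      # no bin shares this pattern
```

A contig carrying the SNP would inherit the bin of a unique match; ambiguous or empty results would need more individuals or more markers to resolve.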

University of the Western Cape Research Repository

Genome-Wide Distribution and Organization of Microsatellites in Plants: An Insight into Marker Development in Brachypodium

Author: AH Paterson
Anshul Sharma
B Harr
C Schlötterer
DE Riley
Deepak K. Gupta
DF Garvin
G Blanc
G Levinson
GA Tuskan
GD Schuler
H Singh
H Sonah
Humira Sonah
The Arabidopsis Genome Initiative
The International Brachypodium Initiative
J Draper
J Vogel
J Yu
Jai C. Rana
JC Whittaker
JL Weber
JM Hancock
K Chabane
KH Wolfe
MC Saha
Nagendra K. Singh
P Modrich
PK Gupta
R Chakraborty
Raju N. Gacche
Rupesh K. Deshmukh
S Li
S Temnykh
SA Goff
SC Alves
SK Parida
SR McCouch
T Asp
Tilak R. Sharma
TJ Anderson
TJ Vision
Vinay P. Singh
W Powell
X Xu
YG Cho
Ying Xu
Publication venue: Public Library of Science
Publication date: 21/06/2011

Plant genomes are complex and contain large amounts of repetitive DNA, including microsatellites that are distributed across entire genomes. Whole genome sequences of several monocot and dicot plants that are available in the public domain provide an opportunity to study the origin, distribution and evolution of microsatellites, and also facilitate the development of new molecular markers. In the present investigation, a genome-wide analysis of microsatellite distribution in monocots (Brachypodium, sorghum and rice) and dicots (Arabidopsis, Medicago and Populus) was performed. A total of 797,863 simple sequence repeats (SSRs) were identified in the whole genome sequences of six plant species. Characterization of these SSRs revealed that mono-nucleotide repeats were the most abundant repeats, and that the frequency of repeats decreased with increasing motif length in both monocots and dicots. However, the frequency of SSRs was higher in dicots than in monocots for both nuclear and chloroplast genomes. Interestingly, GC-rich repeats were the dominant repeats only in monocots, with the majority of them being present in the coding region. These coding GC-rich repeats were found to be involved in different biological processes, predominantly binding activities. In addition, a set of 22,879 SSR markers that were validated by e-PCR were developed and mapped on different chromosomes in Brachypodium for the first time, with a frequency of 101 SSR markers per Mb. Experimental validation of 55 markers showed successful amplification of 80% of the SSR markers in 16 Brachypodium accessions. An online database ‘BraMi’ (Brachypodium microsatellite markers) of these genome-wide SSR markers was developed and made available in the public domain. The observed differential patterns of SSR marker distribution would be useful for studying microsatellite evolution in a monocot–dicot system.
SSR markers developed in this study would be helpful for genomic studies in Brachypodium and related grass species, especially for the map-based cloning of candidate gene(s).
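As an illustration of what identifying SSRs in a sequence involves, here is a minimal sketch that finds perfect tandem repeats with regular expressions. It is not the tool used in the study; the function name and the minimum repeat counts per motif length are assumptions chosen only for the example.

```python
import re


def find_ssrs(seq, max_motif=6, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    # Assumed thresholds: minimum number of tandem copies per motif length.
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GG" + "A" * 12 + "CC" + "AG" * 7 + "TTC"
ssrs = find_ssrs(example)
```

Counting hits by motif length over a whole genome would give the kind of frequency comparison the abstract describes; imperfect and compound repeats would need a more permissive search.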

Public Library of Science (PLOS)

Lund University Publications

Identification of conserved gene clusters in multiple genomes based on synteny and homology

Author: A Alexeyenko
A Bergeron
AK Bansal
Anasua Sarkar
B Snel
D Goldberg
DJ Sherman
EA Housworth
FA Kondrashov
G Consortium
G Didier
Hayssam Soueidan
J Tamames
JH Nadeau
K Vandepoele
KH Wolfe
L Li
L Parida
M Ermolaeva
M Lynch
M Nikolski
Macha Nikolski
ML Seret
MP Beal
Q Yang
R Hoberman
R Jothi
R Overbeek
S Heber
S Kim
S Ohno
T Dandekar
T Schmidt
TJ Vision
W Fitch
WM Fitch
X He
Publication venue: BioMed Central
Publication date: 01/10/2011

Background: Uncovering the relationship between conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. Results: Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemiascomycete phylum. Conclusions: The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. Resulting orthologous clusters are comparable to those obtained by manual curation.
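The neighborhood comparison described in this abstract can be pictured with a small sketch. This is not the SYNS algorithm itself, only a simplified illustration under stated assumptions: each genome is a list of gene-family identifiers in chromosomal order, and two members of a family are compared by the families found within a fixed window around each. All names and the window size are hypothetical.

```python
def neighborhood(genome, index, w):
    """Families of genes within w positions of genome[index], excluding that gene."""
    lo = max(0, index - w)
    hi = min(len(genome), index + w + 1)
    return {genome[i] for i in range(lo, hi) if i != index}


def shared_neighbors(genome_a, i, genome_b, j, w=2):
    """Families present in both windows around genome_a[i] and genome_b[j]."""
    return neighborhood(genome_a, i, w) & neighborhood(genome_b, j, w)


# Hypothetical gene orders; "X" is the family being refined.
genome_a = ["f1", "f2", "X", "f3", "f4"]
genome_b = ["f9", "f2", "X", "f3", "f8"]
genome_c = ["f5", "f6", "X", "f7", "f0"]

conserved = shared_neighbors(genome_a, 2, genome_b, 2)
unrelated = shared_neighbors(genome_a, 2, genome_c, 2)
```

Members of "X" whose windows share many families (as in genome_a and genome_b) would be candidates for the same orthologous sub-family, while a copy with no shared context (genome_c) would not.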

Selection of a core set of RILs from Forrest × Williams 82 to develop a framework map in soybean

Author: C Lister
C Wu
DA Lightfoot
David A. Sleper
DJ Sargent
DL Hyten
FM You
H Hisano
Henry T. Nguyen
IY Choi
J Fan
J Gai
J Schmutz
J. Grover Shannon
JB Yan
JC Nelson
JE Frelichowski
Jill A. Leroy
JL Shultz
K Yang
M Schuelke
MA Kassem
N Sharopova
PB Cregan
QJ Song
RC Shoemaker
SA Jackson
T Thiel
TD Vuong
TJ Vision
Tri D. Vuong
TY Hwang
W Howad
X Wu
Xiaolei Wu
XL Wu
Y Han
Z Xia
Publication venue: Springer-Verlag
Publication date: 01/01/2011

Soybean BAC-based physical maps provide a useful platform for gene and QTL map-based cloning, EST mapping, marker development, genome sequencing, and comparative genomic research. Soybean physical maps for “Forrest” and “Williams 82”, representing the southern and northern US soybean germplasm base, respectively, have been constructed with different fingerprinting methods. These physical maps are complementary for coverage of gaps on the 20 soybean linkage groups. More than 5,000 genetic markers have been anchored onto the Williams 82 physical map, but only a limited number of markers have been anchored to the Forrest physical map. A mapping population of Forrest × Williams 82 made up of 1,025 F8 recombinant inbred lines (RILs) was used to construct a reference genetic map. A framework map with almost 1,000 genetic markers was constructed using a core set of these RILs. The core set of the population was evaluated against the theoretical population using equality, symmetry and representativeness tests. A high-resolution genetic map will allow integration and utilization of the physical maps to target QTL regions of interest, and to place a larger number of markers into a map in a more efficient way using a core set of RILs.

A pipeline for high throughput detection and mapping of SNPs from EST databases

Author: A Ching
A Rafalski
A-C Syvanen
A. M. Anithakumari
AK Masouleh
BCY Collard
Ben Vosman
C Schlotterer
C. Gerard van der Linden
D Milbourne
DJ Somers
DL Hyten
DL Wheeler
E Jacobsen
G Ablett
G Barker
G Jander
GT Bryan
GT Marth
Herman J. van Eck
HV van Os
IY Choi
J Tang
Jack A. M. Leunissen
JBOA Fan
Jifeng Tang
JS Werij
JW Van Ooijen
KL McNally
N Rostoks
P Vos
PS Hanneman RE
R Sachidanandam
R Shen
RA Hoskins
RE Voorrips
Richard G. F. Visser
S Feingold
SF Altschul
TJ Vision
Y-J Shen
YL Zhu
Publication venue: Springer Netherlands
Publication date: 01/01/2010

Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic variation that can be used as molecular markers. The SNPs that are hidden in sequence databases can be unlocked using bioinformatic tools. For efficient application of these SNPs, the sequence set should be as error-free as possible, target single loci and be suitable for the SNP scoring platform of choice. We have developed a pipeline to effectively mine SNPs from public EST databases with or without quality information using QualitySNP software, select reliable SNPs and prepare the loci for analysis on the Illumina GoldenGate genotyping platform. The applicability of the pipeline was demonstrated using publicly available potato EST data, genotyping individuals from two diploid mapping populations and subsequently mapping the SNP markers (putative genes) in both populations. Over 7000 reliable SNPs were identified that met the criteria for genotyping on the GoldenGate platform. Of the 384 SNPs on the SNP array, approximately 12% dropped out. For the two potato mapping populations, 165 and 185 segregating SNP loci could be mapped on the respective genetic maps, illustrating the effectiveness of our pipeline for SNP selection and validation.

Wageningen University & Research Publications

Analysis of high-identity segmental duplications in the grapevine genome

Background: Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. SDs show at the sequence level the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data are plentiful for mammals, little was known about the representation of SDs in plant genomes. In this regard, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024). Results: We demonstrate that recent SDs (> 94% identity and >= 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further, we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions: These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure, as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.
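The thresholds quoted in this abstract for recent SDs (> 94% identity and >= 10 kb) amount to a simple filter over pairwise alignments. The sketch below assumes alignments are available as (identity percent, length in bp) pairs; the function name and the example values are hypothetical and say nothing about how the authors' own detection was implemented.

```python
def is_recent_sd(identity_pct, length_bp, min_identity=94.0, min_length=10_000):
    """True if an alignment meets the abstract's thresholds for a recent SD."""
    # Identity must be strictly above 94%; length must be at least 10 kb.
    return identity_pct > min_identity and length_bp >= min_length


# Hypothetical alignments as (identity %, length in bp).
alignments = [(95.2, 12_000), (94.0, 12_000), (99.0, 9_999), (97.5, 150_000)]
recent = [a for a in alignments if is_recent_sd(*a)]
```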

Archivio istituzionale della ricerca - Università di Bari

A physical map of Brassica oleracea shows complexity of chromosomal changes following recursive paleopolyploidizations

Author: A Kawabe
AE Yeager
AH Paterson
Andrew H Paterson
B Hansson
Barry Marler
Bayram Yuksel
Beom-Seok Park
C Soderlund
CA Newell
Carl Rogers
Carlos F Quiros
CD Town
Christopher Town
CM O'Neill
Cornelia Lemke
D Babula
DE Soltis
DW Meinke
Emily Giattina
Ethan Epps
FL Iniguez-Luy
G Blanc
Gary Pierce
H Kuittinen
H Tang
HA Lewin
Heidi Sarazen
HJ Muller
IA Parkin
IAP Parkin
J Chris Pires
J Schmutz
JA Udall
JE Bowers
Jennifer Ingles
Jeong-Hwan Mun
JH Mun
JL Bowman
John E Bowers
JR Wortman
JS Kim
KB Lim
L Lukens
LB Smith
Lifeng Lin
Lisa K Nelson
LR Detjen
M Ayele
M Kaczmarek
M Koch
M Trick
MA Koch
MA Lysak
Manuel J Torres
MD Purugganan
ME Schranz
MR Thon
MS Pease
O Jaillon
R Ming
R Schmidt
Richard M Amasino
SA Kempin
Santhosh Karunakaran
SI Warwick
SR Choi
T Attia
T Currence
TH Lan
The Arabidopsis Genome Initiative
Thomas C Osborn
TJ Vision
TJ Yang
W Martin
X Wang
Xiyin Wang
Yongli Xiao
Young-Joo Seol
YW Yang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides a means to explore their genomic complexity. Results: A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity. Conclusions: A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes.
All the physical mapping data are freely shared at a WebFPC site (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/; temporarily password-protected: account: pgml; password: 123qwe123).